derivative observation
Derivative Observations in Gaussian Process Models of Dynamic Systems
Gaussian processes provide an approach to nonparametric modelling which allows a straightforward combination of function and derivative observations in an empirical model. This is of particular importance in identification of nonlinear dynamic systems from experimental data. This derivative information can be in the form of priors specified by an expert or identified from perturbation data close to equilibrium.
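As a rough illustration of how function and derivative observations can be combined in one model, the sketch below builds the joint covariance of $f$ and $f'$ for a one-dimensional squared-exponential kernel and computes the posterior mean. The kernel choice, the function names `rbf_blocks` and `gp_predict_with_derivatives`, and the hyperparameter defaults are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def rbf_blocks(xa, xb, lengthscale=1.0, variance=1.0):
    """Covariance blocks of a 1-D squared-exponential kernel and its derivatives."""
    r = xa[:, None] - xb[None, :]                      # pairwise differences xa_i - xb_j
    k = variance * np.exp(-0.5 * r**2 / lengthscale**2)
    k_fd = k * r / lengthscale**2                      # cov(f(xa_i), f'(xb_j))
    k_dd = k * (1.0 / lengthscale**2 - r**2 / lengthscale**4)  # cov(f'(xa_i), f'(xb_j))
    return k, k_fd, k_dd

def gp_predict_with_derivatives(x_train, y, dy, x_test, lengthscale=1.0, noise=1e-6):
    """Posterior mean of f at x_test given values y and derivatives dy at x_train."""
    k_ff, k_fd, k_dd = rbf_blocks(x_train, x_train, lengthscale)
    # Joint covariance of the stacked vector [f(x_train), f'(x_train)].
    K = np.block([[k_ff, k_fd], [k_fd.T, k_dd]])
    K += noise * np.eye(K.shape[0])                    # small jitter / observation noise
    ks_f, ks_d, _ = rbf_blocks(x_test, x_train, lengthscale)
    K_star = np.hstack([ks_f, ks_d])                   # cov(f(x_test), [f, f'] at x_train)
    alpha = np.linalg.solve(K, np.concatenate([y, dy]))
    return K_star @ alpha

if __name__ == "__main__":
    x = np.linspace(0.0, 2.0 * np.pi, 6)
    x_new = np.linspace(0.0, 2.0 * np.pi, 50)
    mean = gp_predict_with_derivatives(x, np.sin(x), np.cos(x), x_new)
    print(mean[:5])
```

Stacking the derivative observations under the function values doubles the size of the linear system in one dimension; in $D$ dimensions each point contributes $D + 1$ rows.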
Scaling Gaussian Processes with Derivative Information Using Variational Inference
Padidar, Misha, Zhu, Xinran, Huang, Leo, Gardner, Jacob R., Bindel, David
Gaussian processes with derivative information are useful in many settings where derivative information is available, including numerous Bayesian optimization and regression tasks that arise in the natural sciences. Incorporating derivative observations, however, comes with a dominating $O(N^3D^3)$ computational cost when training on $N$ points in $D$ input dimensions. This is intractable for even moderately sized problems. While recent work has addressed this intractability in the low-$D$ setting, the high-$N$, high-$D$ setting is still unexplored and of great value, particularly as machine learning problems increasingly become high dimensional. In this paper, we introduce methods to achieve fully scalable Gaussian process regression with derivatives using variational inference. Analogous to the use of inducing values to sparsify the labels of a training set, we introduce the concept of inducing directional derivatives to sparsify the partial derivative information of a training set. This enables us to construct a variational posterior that incorporates derivative information but whose size depends neither on the full dataset size $N$ nor the full dimensionality $D$. We demonstrate the full scalability of our approach on a variety of tasks, ranging from a high dimensional stellarator fusion regression task to training graph convolutional neural networks on Pubmed using Bayesian optimization. Surprisingly, we find that our approach can improve regression performance even in settings where only label data is available.
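The cubic cost quoted in the abstract can be read off from the size of the joint covariance matrix. The block below is a sketch of that argument; the symbols $M$ (number of inducing points) and $p$ (number of inducing directions per point) are introduced here for illustration and are assumptions rather than notation confirmed by the abstract.

```latex
% Joint covariance of function values and gradients at N points in D dimensions
K^{\nabla}(X, X) =
\begin{bmatrix}
  K(X, X)          & \nabla_{x'} K(X, X) \\
  \nabla_{x} K(X, X) & \nabla_{x} \nabla_{x'}^{\top} K(X, X)
\end{bmatrix}
\in \mathbb{R}^{N(D+1) \times N(D+1)},
\qquad
\text{factorization cost } O\big((N(D+1))^3\big) = O(N^3 D^3).

% A directional derivative along a unit vector v is a linear functional of the gradient
\partial_{v} f(x) = v^{\top} \nabla f(x),
\qquad
\operatorname{cov}\big(\partial_{v} f(x),\, \partial_{v'} f(x')\big)
  = v^{\top} \big[\nabla_{x} \nabla_{x'}^{\top} k(x, x')\big]\, v'.

% With M inducing points and p directions each (assumed notation),
% the inducing covariance is M(p+1) x M(p+1), independent of N and D.
```

Under these assumptions, replacing the full gradient at each inducing point with $p \ll D$ directional derivatives is what removes the dependence on $D$ from the size of the variational posterior.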
Scalable Bayesian Optimization with Sparse Gaussian Process Models
Bayesian optimization is a set of powerful tools for efficient black-box optimization and has been applied in a wide variety of fields. In this thesis we first seek to advance Bayesian optimization by using estimated derivative observations. We then tackle the issues that arise in Bayesian optimization when a large number of derivative observations and/or function observations are present. We describe our motivations in Chapter 1. We then give a broad review of Bayesian optimization in Chapter 2, starting with the history of Bayesian optimization and its components.
Correcting boundary over-exploration deficiencies in Bayesian optimization with virtual derivative sign observations
Siivola, Eero, Vehtari, Aki, Vanhatalo, Jarno, González, Javier
Bayesian optimization (BO) is a global optimization strategy designed to find the minimum of an expensive black-box function, typically defined on a continuous subset of $\mathcal{R}^d$, by using a Gaussian process (GP) as a surrogate model for the objective. Although currently available acquisition functions address this goal with different degrees of success, an over-exploration effect at the boundary of the search space is typically observed. However, in problems like the configuration of machine learning algorithms, the function domain is conservatively large and with high probability the global minimum does not sit on the boundary. We propose a method to incorporate this knowledge into the search process by adding virtual derivative observations to the GP at the borders of the search space. We use the properties of GPs to impose conditions on the partial derivatives of the objective. The method is applicable with any acquisition function, is easy to use, and consistently reduces the number of evaluations required to optimize the objective irrespective of the acquisition function used. We illustrate the benefits of our approach in an extensive experimental comparison.